# Document Image Analysis

Qwen2.5 VL 7B Instruct Quantized.w4a16

A quantized version of Qwen2.5-VL-7B-Instruct that accepts image and text input and produces text output. Weights are quantized to INT4, while activations are kept in FP16.

Transformers · English
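The "w4a16" scheme named in this entry (4-bit weights, 16-bit activations) can be illustrated with a minimal sketch. It assumes simple symmetric per-tensor quantization, which may differ from the scheme actually used for this checkpoint. The function names are hypothetical, and plain Python floats stand in for FP16 activations.

```python
# Minimal sketch of weight-only INT4 quantization (assumed symmetric, per-tensor).
# Function names are illustrative and do not come from any library.

def quantize_int4(weights):
    """Map float weights to integer codes in [-8, 7] plus one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

def w4a16_dot(codes, scale, activations):
    """Dot product with weights dequantized on the fly.

    Activations are not quantized; Python floats stand in for FP16 here.
    """
    return sum(c * scale * a for c, a in zip(codes, activations))

weights = [0.7, -0.35, 0.1, 0.0]
activations = [1.0, 2.0, -1.0, 0.5]
codes, scale = quantize_int4(weights)
approx = w4a16_dot(codes, scale, activations)
exact = sum(w * a for w, a in zip(weights, activations))
print(codes, scale, approx, exact)
```

Storing each weight as a 4-bit code instead of a 16-bit float cuts weight memory to roughly a quarter, at the cost of a rounding error of at most half the scale per weight.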

Paligemma2 3b Ft Docci 448

PaliGemma 2 is an upgraded vision-language model released by Google. It combines the Gemma 2 language model with the SigLIP vision encoder and supports multilingual vision-language tasks.

Sd3 Long Captioner V2

A fine-tuned image-to-text generation model based on the 224x224 version of PaliGemma, specializing in generating detailed descriptions of artistic images.

Transformers · Supports multiple languages

© 2025 AIbase